HyDRA: gene prioritization via hybrid distance-score rank aggregation

Authors

  • MinJi Kim
  • Farzad Farnoud
  • Olgica Milenkovic
Abstract

Gene prioritization refers to a family of computational techniques for inferring disease genes through a set of training genes and carefully chosen similarity criteria. Test genes are scored based on their average similarity to the training set, and the rankings of genes under various similarity criteria are aggregated via statistical methods. The contributions of our work are threefold:

(i) First, based on the realization that there is no unique way to define an optimal aggregate for rankings, we investigate the predictive quality of a number of new aggregation methods and known fusion techniques from machine learning and social choice theory. Within this context, we quantify the influence of the number of training genes and similarity criteria on the diagnostic quality of the aggregate and perform in-depth cross-validation studies.

(ii) Second, we propose a new approach to genomic data aggregation, termed HyDRA (Hybrid Distance-score Rank Aggregation), which combines the advantages of score-based and combinatorial aggregation techniques. We also propose incorporating a new top-versus-bottom (TvB) weighting feature into the hybrid schemes. The TvB feature ensures that aggregates are more reliable at the top of the list than at the bottom, since only top candidates are tested experimentally.

(iii) Third, we propose an iterative procedure for gene discovery that operates via successful augmentation of the set of training genes by genes discovered in previous rounds, checked for consistency.

MOTIVATION: Fundamental results from social choice theory, political and computer sciences, and statistics have shown that there exists no consistent, fair and unique way to aggregate rankings. Instead, one has to decide on an aggregation approach using a predefined set of desirable properties for the aggregate. The aggregation methods fall into two categories, score-based and distance-based approaches, each of which has its own drawbacks and advantages. This work is motivated by the observation that by merging these two techniques in a computationally efficient manner, and by incorporating additional constraints, one can ensure that the predictive quality of the resulting aggregation algorithm is very high.

RESULTS: We tested HyDRA on a number of gene sets, including autism, breast cancer, colorectal cancer, endometriosis, ischaemic stroke, leukemia, lymphoma and osteoarthritis. Furthermore, we performed iterative gene discovery for glioblastoma, meningioma and breast cancer, using a sequentially augmented list of training genes related to the Turcot syndrome, Li-Fraumeni condition and other diseases. The methods outperform state-of-the-art software tools such as ToppGene and Endeavour. Despite this finding, we recommend, as best practice, taking the union of top-ranked items produced by different methods for the final aggregated list.

AVAILABILITY AND IMPLEMENTATION: The HyDRA software may be downloaded from: http://web.engr.illinois.edu/~mkim158/HyDRA.zip.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
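To make the distinction between the two families of aggregation methods concrete, the following is a minimal sketch of one textbook method from each family: the Borda count (score-based) and a brute-force Kemeny aggregate under the Kendall tau distance (distance-based). This is not the HyDRA algorithm, and it does not include the TvB weighting; the function names and the placeholder gene labels `G1` to `G4` are assumptions made purely for illustration.

```python
from itertools import combinations, permutations


def borda_aggregate(rankings):
    """Score-based aggregation: an item at position p (0-based) in a
    ranking of n items earns n - p points; items are sorted by total."""
    items = rankings[0]
    n = len(items)
    scores = {g: 0 for g in items}
    for ranking in rankings:
        for pos, g in enumerate(ranking):
            scores[g] += n - pos
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(items, key=lambda g: (-scores[g], g))


def kendall_tau(a, b):
    """Number of item pairs that rankings a and b order differently."""
    pos_a = {g: i for i, g in enumerate(a)}
    pos_b = {g: i for i, g in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_aggregate(rankings):
    """Distance-based aggregation: the permutation minimizing the total
    Kendall tau distance to all input rankings. Exhaustive search, so
    only feasible for a handful of items."""
    items = sorted(rankings[0])
    best = min(
        permutations(items),
        key=lambda p: sum(kendall_tau(p, r) for r in rankings),
    )
    return list(best)


# Three hypothetical rankings of four placeholder genes, one per
# similarity criterion.
rankings = [
    ["G1", "G2", "G3", "G4"],
    ["G2", "G1", "G3", "G4"],
    ["G1", "G3", "G2", "G4"],
]
print(borda_aggregate(rankings))
print(kemeny_aggregate(rankings))
```

The exhaustive search in `kemeny_aggregate` grows factorially with the number of items, which illustrates why computational efficiency matters when distance-based and score-based ideas are combined at genomic scale.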

Similar resources

Trapezoidal intuitionistic fuzzy prioritized aggregation operators and application to multi-attribute decision making

In some multi-attribute decision making (MADM) problems, various relationships among the decision attributes should be considered. This paper investigates the prioritization relationship of attributes in MADM with trapezoidal intuitionistic fuzzy numbers (TrIFNs). TrIFNs are a special intuitionistic fuzzy set on a real number set and have the better capability to model ill-known quantities. Fir...

On the hardness of maximum rank aggregation problems

The rank aggregation problem consists in finding a consensus ranking on a set of alternatives, based on the preferences of individual voters. The alternatives are expressed by permutations, whose pairwise distance can be measured in many ways. In this work we study a collection of distances, including the Kendall tau, Spearman footrule, Minkowski, Cayley, Hamming, Ulam, and related edit distanc...

A Meta-Analysis Based Method for Prioritizing Candidate Genes Involved in a Pre-specific Function

The identification of genes associated with a given biological function in plants remains a challenge, although network-based gene prioritization algorithms have been developed for Arabidopsis thaliana and many non-model plant species. Nevertheless, these network-based gene prioritization algorithms have encountered several problems; one in particular is that of unsatisfactory prediction accura...

Rank aggregation methods comparison: A case for triage prioritization

This paper seeks to test and to determine a suitable aggregation method to represent a set of rankings made by individual decision makers (DMs). A case study for triage prioritization is used to test the aggregation methods. The triage is a decision-making process with which patients are prioritized according to their medical condition and chance of survival on arrival at the emergency departme...

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing search engine results. Ranking takes place according to one or more parameters, such as backward links, forward links, content and click-through rate. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of the search engine that represents the degree of the vitality...


Journal:
  • Bioinformatics

Volume 31, Issue 7


Publication date: 2015